Support eventstats command with Calcite#3585
Support eventstats command with Calcite#3585LantaoJin merged 10 commits intoopensearch-project:mainfrom
eventstats command with Calcite#3585Conversation
Signed-off-by: Lantao Jin <ltjin@amazon.com>
|
|
||
| private final UnresolvedExpression windowFunction; | ||
|
|
||
| private final List<UnresolvedExpression> windowFunctionList; |
There was a problem hiding this comment.
do we need to add window boundary? For example, I want to execute a ppl equal to a sql like
select AVG(SUM(total_amount)) OVER (ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_3day (this is one of the moving avg, which we plan to support for t2visbuilder).
There was a problem hiding this comment.
It's easy to support it. but not this PR of eventstats command. We can extend this syntax after PPL lang unified. e.g support boundary when we implement streamstats command.
There was a problem hiding this comment.
I have added a WindowFrame for further using.
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Outdated
Show resolved
Hide resolved
| * ``eventstats``: Useful when you need to enrich events with statistical context for further analysis or filtering. Can be used mid-search to add statistics that can be used in subsequent commands. | ||
|
|
||
|
|
||
| Version |
There was a problem hiding this comment.
nice!
Could we highlight version info? for instance, in title, evenstats (since 3.1.0). or maybe using version controled doc? Any thoughts? @dai-chen @qianheng-aws
There was a problem hiding this comment.
highlighted in description and docs/user/ppl/index.rst
|
|
||
| Example:: | ||
|
|
||
| PPL> source=accounts | eventstats sum(age) by gender; |
There was a problem hiding this comment.
add an example explain how to handle missing value?
| if (BuiltinFunctionName.AVG == agg.get()) { | ||
|
|
||
| } else { | ||
| } |
| rows("John", "Canada", "Ontario", 4, 2023, 25, 4, 36.25, 20, 70), | ||
| rows("Jake", "USA", "California", 4, 2023, 70, 4, 36.25, 20, 70), | ||
| rows("Jane", "Canada", "Quebec", 4, 2023, 20, 4, 36.25, 20, 70), | ||
| rows("Hello", "USA", "New York", 4, 2023, 30, 4, 36.25, 20, 70)); |
There was a problem hiding this comment.
in scheam, 2nd column is age, but in result, 2nd column is country?
There was a problem hiding this comment.
The data order is the actually schema order. Let me change to verifySchemaInOrder
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
|
@dai-chen please take another look. |
|
|
||
| static RexNode makeOver( | ||
| CalcitePlanContext context, | ||
| String name, |
There was a problem hiding this comment.
[minor] Why not just pass in BuiltinFunctionName here?
| RexWindowBound lowerBound = convert(context, windowFrame.getLower()); | ||
| RexWindowBound upperBound = convert(context, windowFrame.getUpper()); | ||
| switch (functionName) { | ||
| // There is no "avg" AggImplementor in Calcite, we have to change avg window |
There was a problem hiding this comment.
There is a Calcite rule AggregateReduceFunctionsRule, it will do such transformation of rewriting avg to sum/count. Could we just use avg here and leverage that rule for transformation? It should introduce more refined handling for cases like NULL values.
There was a problem hiding this comment.
Thanks for informing this, let me check how to apply this rule.
There was a problem hiding this comment.
After checking AggregateReduceFunctionsRule code, seems it could be applied by logical Aggregate, rather than logical Window. The converting made by SqlNode level (parser). We can open a follow-up ticket about self-dev rule for window.
| } | ||
| } | ||
|
|
||
| public void pushWindowPartitions(List<RexNode> partition) { |
There was a problem hiding this comment.
Seems never used for these 3 APIs. And why should the window related context stored in this global context?
There was a problem hiding this comment.
Will delete them. It is useless code.
| .map( | ||
| groupCtx -> | ||
| (UnresolvedExpression) | ||
| new Alias( |
There was a problem hiding this comment.
[question] Why do we always need to wrap byClause with Alias here?
There was a problem hiding this comment.
could be a followup and it needs much code refactor for stats, not sure the original reason.
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* Support eventstats command with Calcite Signed-off-by: Lantao Jin <ltjin@amazon.com> * add doc Signed-off-by: Lantao Jin <ltjin@amazon.com> * fix IT Signed-off-by: Lantao Jin <ltjin@amazon.com> * address comments Signed-off-by: Lantao Jin <ltjin@amazon.com> * Fix conflicts Signed-off-by: Lantao Jin <ltjin@amazon.com> * address comments Signed-off-by: Lantao Jin <ltjin@amazon.com> * Support variance functions Signed-off-by: Lantao Jin <ltjin@amazon.com> * fix UT Signed-off-by: Lantao Jin <ltjin@amazon.com> * update doc Signed-off-by: Lantao Jin <ltjin@amazon.com> --------- Signed-off-by: Lantao Jin <ltjin@amazon.com> Signed-off-by: xinyual <xinyual@amazon.com>
Description
Support
eventstatscommand with CalciteThe
eventstatscommand actually equals to window function in SQLequals to
And the default window frame is
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWINGIn this PR, the supported window functions are
We will support more window functions in follow-up PRs since they are not supported in PPL-Spark
Related Issues
Resolves #3563
Check List
--signoff.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.